___________________________________________________________________
Selected Examples of
Scanner Specifications
J. Grosch
___________________________________________________________________
___________________________________________________________________
GESELLSCHAFT FUeR MATHEMATIK
UND DATENVERARBEITUNG MBH
FORSCHUNGSSTELLE FUeR
PROGRAMMSTRUKTUREN
AN DER UNIVERSITAeT KARLSRUHE
___________________________________________________________________
Project
Compiler Generation
___________________________________________________________
Selected Examples of Scanner Specifications
Josef Grosch
Mar. 8, 1988
___________________________________________________________
Report No. 7
Copyright (c) 1988 GMD
Gesellschaft fuer Mathematik und Datenverarbeitung mbH
Forschungsstelle an der Universitaet Karlsruhe
Vincenz-Priesznitz-Str. 1
D-7500 Karlsruhe
Scanner Specification 2
1. Introduction
Among the tokens to be recognized by scanners are a few that require
nontrivial processing: comments, strings, and character constants. Even
identifiers and keywords may cause some trouble if the language defines
upper-case and lower-case letters to have the same meaning. The problems
with these tokens are the following:
- maintaining the line count during tokens extending over several lines
- maintaining the column count during tokens containing tab characters
- computing the source position of tokens extending over several lines
  or of compound tokens which are recognized as a sequence of subtokens
- handling nested comments
- reporting unclosed strings and comments as errors
- computing the internal representation of strings
- converting escape sequences such as doubled string delimiters or
  preceding escape characters
- normalizing upper-case and lower-case letters
The following chapters contain solutions to the above problems for the
languages Pascal, Modula, C, and Ada. The solutions are scanner specifications
suitable as input for the scanner generator Rex [Gro87]. The primary
intention of this paper is to serve as a reference manual containing examples
for nontrivial cases. All specifications use C as target language except the
chapter on Modula, which uses Modula. The Appendix contains a complete scanner
specification for Ada with Modula as target language.
2. Pascal
2.1. Comments
Problems to solve:
- unclosed comments
- newline characters
- tab characters
Solution:
EOF {if (yyStartState == Comment) Error ("unclosed comment");}
DEFINE CmtCh = - {*\}\t\n}.
START Comment
RULE
"(*" | "{" :- {yyStart (Comment);}
#Comment# "*)" | "}" :- {yyStart (STD);}
#Comment# "*" | CmtCh + :- {}
Comments are processed in a separate start state called Comment. Everything
is skipped in this state except closing comment brackets, which switch back
to start state STD. The single character '*', which can start a closing
comment bracket, has to be skipped separately, and '}' is excluded from the
set CmtCh. Otherwise closing comment brackets would not be recognized because
of the "longest match" rule of Rex. An unclosed comment is indicated by
reaching end of file while in start state Comment. We presuppose the
existence of a procedure Error to report this condition. We don't need to
care about tab and newline characters other than excluding them from the set
CmtCh because the two rules needed for this problem are already predefined by
Rex:
#Comment# \t :- {yyTab;}
#Comment# \n :- {yyEol (0);}
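For comparison, the behaviour of the start state Comment can be sketched as a hand-written C routine. This is only an illustration, not code generated by Rex; the name skip_pascal_comment and its parameters are assumptions made for this sketch. As in the specification above, either closing bracket ends the comment, and a lone '*' is simply skipped.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Rex: skips a Pascal comment that starts
   at text[*pos] with "(*" or "{".  Returns 1 if the comment was closed and
   0 if the end of the input was reached first (the "unclosed comment" case).
   On return *pos is the index just past the scanned text, and *lines has
   been incremented once per newline seen inside the comment. */
int skip_pascal_comment(const char *text, size_t *pos, int *lines)
{
    size_t i = *pos;

    /* Step over the opening bracket. */
    i += (text[i] == '{') ? 1 : 2;

    while (text[i] != '\0') {
        if (text[i] == '}') {                        /* closing "}" */
            *pos = i + 1;
            return 1;
        }
        if (text[i] == '*' && text[i + 1] == ')') {  /* closing "*)" */
            *pos = i + 2;
            return 1;
        }
        if (text[i] == '\n')
            (*lines)++;                              /* keep the line count */
        i++;                      /* skip anything else, also a lone '*' */
    }
    *pos = i;
    return 0;                                        /* unclosed comment */
}
```

A caller would report "unclosed comment" whenever the routine returns 0, which corresponds to the EOF section of the specification.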
2.2. Identifiers
Problems to solve:
- normalization of upper-case and lower-case letters
Solution:
EXPORT {
# include "Idents.h"
# include "Positions.h"
typedef struct {
tPosition Position;
tIdent Ident;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokIdentifier ...
void ErrorAttribute (Token, Attribute)
int Token;
tScanAttribute * Attribute;
{
Attribute->Ident = NoIdent;
}
}
LOCAL {char String [256]; int L;}
DEFINE letter = {A-Z a-z}.
digit = {0-9}.
RULE
letter (letter | digit) * : {L = GetLower (String);
Attribute.Ident = MakeIdent (String, L); return TokIdentifier;}
Normalization of upper-case and lower-case letters to lower-case is done
by the predefined operation GetLower of Rex.
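The effect of such a normalization can be sketched in plain C as follows. The function lower_copy is a hypothetical stand-in written for this illustration; it does not show how GetLower is implemented in Rex, and the explicit buffer size parameter is an addition of this sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the effect of GetLower: copies at most
   size - 1 characters of token[0 .. length-1] into out, converted to
   lower case, terminates out with '\0', and returns the number copied. */
size_t lower_copy(const char *token, size_t length, char *out, size_t size)
{
    size_t n = 0;

    if (size == 0)
        return 0;
    while (n < length && n + 1 < size) {
        out[n] = (char) tolower((unsigned char) token[n]);
        n++;
    }
    out[n] = '\0';
    return n;
}
```

After normalization the identifiers WriteLn, WRITELN, and writeln all yield the same string, so a symbol table sees a single identifier.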
2.3. Character Constants
Problems to solve:
- conversion
- tab characters
Solution:
EXPORT {
# include "Positions.h"
typedef struct {
tPosition Position;
char Char;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokCharConst ...
void ErrorAttribute (Token, Attribute)
int Token;
tScanAttribute * Attribute;
{
Attribute->Char = '\0';
}
}
RULE
'''' : {Attribute.Char = '\''; return TokCharConst;}
' \t ' : {Attribute.Char = '\t'; yyTab2 (1, 1); return TokCharConst;}
' ANY ' : {Attribute.Char = TokenPtr [1]; return TokCharConst;}
In this example the order of the rules is significant because the last
rule would also match the characters of the preceding one.
2.4. Strings
Problems to solve:
- conversion
- doubled delimiters
- tab characters
- unclosed strings (at end of lines)
- source position
Solution:
EXPORT {
# include "StringMem.h"
# include "Positions.h"
typedef struct {
tPosition Position;
tStringRef StringRef;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokString ...
void ErrorAttribute (Token, Attribute) ...
}
LOCAL {char String [256]; int L;}
DEFINE StrCh = - {'\t\n}.
START string
RULE
#STD# ' : {yyStart (string); L = 0;}
#string# StrCh +:- {L += GetWord (& String [L]);}
#string# '' :- {String [L ++] = '\'';}
#string# ' :- {yyStart (STD); String [L] = '\0';
Attribute.StringRef = PutString (String, L);
return TokString;}
#string# \t :- {String [L ++] = '\t'; yyTab;}
#string# \n :- {Error ("unclosed string"); yyEol (0);
yyStart (STD); String [L] = '\0';
Attribute.StringRef = PutString (String, L);
return TokString;}
We presuppose the existence of a string memory module StringMem. The
procedure PutString stores a string in the string memory and returns a
reference to it which can be used as attribute of the token TokString.
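The conversion performed by the rules of the start state string can be sketched as one C routine. The name decode_pascal_string and its interface are assumptions of this sketch. In contrast to the specification it checks the buffer size, and it truncates over-long strings silently.

```c
#include <stddef.h>

/* Hypothetical helper: decodes a Pascal string literal whose opening
   apostrophe is src[0].  A doubled apostrophe stands for one apostrophe.
   The decoded text is stored in out (capacity size, '\0'-terminated if
   size > 0) and *used receives the number of source characters consumed.
   Returns 1 for a properly closed string and 0 if a newline or the end of
   the input is reached first (the "unclosed string" case). */
int decode_pascal_string(const char *src, char *out, size_t size, size_t *used)
{
    size_t i = 1, n = 0;        /* i starts behind the opening apostrophe */
    int closed = 0;

    while (src[i] != '\0' && src[i] != '\n') {
        char c = src[i++];
        if (c == '\'') {
            if (src[i] == '\'') {
                i++;            /* doubled delimiter: keep one apostrophe */
            } else {
                closed = 1;     /* single delimiter: end of the string */
                break;
            }
        }
        if (n + 1 < size)
            out[n++] = c;       /* over-long strings are truncated */
    }
    if (size > 0)
        out[n] = '\0';
    *used = i;
    return closed;
}
```

For the input 'it''s' the routine stores it's and consumes seven characters. The newline that ends an unclosed string is not consumed, so line counting can remain with the caller.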
2.5. Keywords
Problems to solve:
- normalization of upper-case and lower-case letters
Solution:
GLOBAL {
# define TokAND ...
...
# define TokWITH ...
void ErrorAttribute (Token, Attribute) ...
}
DEFINE A = {Aa}.
...
Z = {Zz}.
RULE
A N D : {return TokAND ;}
...
W I T H : {return TokWITH ;}
The idea of the solution is to define identifiers A to Z to stand for the
corresponding upper-case as well as lower-case letters. Then specifying the
keywords in upper-case and spaced does the job.
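A different technique with the same visible effect is to scan every word as an identifier first and to look it up in a keyword table while ignoring case. The following C sketch shows this alternative; it is not what Rex generates from the rules above, and the token codes and function names are placeholders chosen for the illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative token codes; the values are placeholders. */
enum { TOK_NONE = 0, TOK_AND = 1, TOK_WITH = 2 };

/* Compares word[0 .. length-1] with an upper-case keyword, ignoring case. */
static int same_ignoring_case(const char *word, size_t length,
                              const char *keyword)
{
    size_t i;

    for (i = 0; i < length; i++) {
        if (keyword[i] == '\0' ||
            toupper((unsigned char) word[i]) != keyword[i])
            return 0;
    }
    return keyword[length] == '\0';   /* lengths must agree as well */
}

/* Returns the token code of a keyword or TOK_NONE for any other word. */
int keyword_token(const char *word, size_t length)
{
    static const struct { const char *text; int token; } table[] = {
        { "AND", TOK_AND }, { "WITH", TOK_WITH },
    };
    size_t k;

    for (k = 0; k < sizeof table / sizeof table[0]; k++)
        if (same_ignoring_case(word, length, table[k].text))
            return table[k].token;
    return TOK_NONE;
}
```

The specification above avoids such a lookup at run time because the keywords become part of the scanner automaton.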
3. Modula
3.1. Comments
Problems to solve:
- nested comments
- unclosed comments
- newline characters
- tab characters
Solution:
GLOBAL {VAR NestingLevel: CARDINAL;}
BEGIN {NestingLevel := 0;}
EOF {IF yyStartState = Comment THEN Error ("unclosed comment"); END;}
DEFINE CmtCh = - {*(\t\n}.
START Comment
RULE
#STD, Comment# "(*" :- {INC (NestingLevel); yyStart (Comment);}
#Comment# "*)" :- {DEC (NestingLevel);
IF NestingLevel = 0 THEN yyStart (STD); END;}
#Comment# "(" | "*" | CmtCh + :- {}
We need a variable NestingLevel to count the nesting depth of comments
because it is not possible to specify nested comments by a regular expression.
Comments are processed in a separate start state called Comment. Everything
is skipped in this state except opening or closing comment brackets which
trigger a change of the nesting level. The single characters '(' and '*' which
can start opening or closing comment brackets have to be skipped separately.
Otherwise comment brackets within comments would not be recognized because of
the "longest match" rule of Rex. An unclosed comment is indicated by reaching
end of file while in start state Comment. We presuppose the existence of a
procedure Error to report this condition. We don't need to care about tab and
newline characters other than excluding them from the set CmtCh because the
two rules needed for this problem are already predefined by Rex:
#Comment# \t :- {yyTab;}
#Comment# \n :- {yyEol (0);}
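The role of the counter NestingLevel can be illustrated by a hand-written C routine. This is a sketch under the assumption that the input is a '\0'-terminated string; the name skip_nested_comment is hypothetical and the routine is not generated by Rex.

```c
#include <stddef.h>

/* Hypothetical helper: skips a possibly nested Modula-2 comment.
   Precondition: text[*pos] and text[*pos + 1] are the opening "(*".
   Returns 1 if the comment was closed and 0 if the end of the input was
   reached first.  On return *pos is the index just past the scanned text. */
int skip_nested_comment(const char *text, size_t *pos)
{
    size_t i = *pos;
    unsigned level = 0;             /* corresponds to NestingLevel */

    while (text[i] != '\0') {
        if (text[i] == '(' && text[i + 1] == '*') {
            level++;                /* opening bracket, possibly nested */
            i += 2;
        } else if (text[i] == '*' && text[i + 1] == ')') {
            level--;                /* closing bracket */
            i += 2;
            if (level == 0) {
                *pos = i;
                return 1;
            }
        } else {
            i++;                    /* anything else, also a lone '(' or '*' */
        }
    }
    *pos = i;
    return 0;                       /* unclosed comment */
}
```

The counter is what a regular expression alone cannot provide: the routine returns only after as many closing brackets as opening brackets have been seen.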
3.2. Strings
Problems to solve:
- conversion
- tab characters
- unclosed strings (at end of lines)
- source position
Solution:
EXPORT {
FROM StringMem IMPORT tStringRef;
FROM Positions IMPORT tPosition;
TYPE tScanAttribute = RECORD
Position : tPosition;
StringRef : tStringRef;
END;
PROCEDURE ErrorAttribute (Token: INTEGER; VAR Attribute: tScanAttribute);
}
GLOBAL {
FROM Strings IMPORT tString, AssignEmpty, Concatenate, Append;
FROM StringMem IMPORT PutString;
CONST TokString = ...;
PROCEDURE ErrorAttribute (Token: INTEGER; VAR Attribute: tScanAttribute);
BEGIN Attribute.StringRef := ...; END ErrorAttribute;
}
LOCAL {VAR String, S: tString;}
DEFINE StrCh1 = - {'\t\n}.
StrCh2 = - {"\t\n}.
START Str1, Str2
RULE
#STD# ' : {AssignEmpty (String); yyStart (Str1);}
#Str1# StrCh1+ :- {GetWord (S); Concatenate (String, S);}
#Str1# ' :- {yyStart (STD);
Attribute.StringRef := PutString (String);
RETURN TokString;}
#STD# \" : {AssignEmpty (String); yyStart (Str2);}
#Str2# StrCh2+ :- {GetWord (S); Concatenate (String, S);}
#Str2# \" :- {yyStart (STD);
Attribute.StringRef := PutString (String);
RETURN TokString;}
#Str1, Str2# \t :- {Append (String, 11C); yyTab;}
#Str1, Str2# \n :- {Error ("unclosed string"); yyEol (0); yyStart (STD);
Attribute.StringRef := PutString (String);
RETURN TokString;}
Two separate start states, Str1 and Str2, are used to recognize the two forms
of Modula-2 strings, delimited by apostrophes and by double quotes
respectively. We presuppose the existence of a string handling module Strings
and a string memory module StringMem. The procedure PutString stores a string
in the string memory and returns a reference to it which can be used as
attribute of the token TokString.
4. C
4.1. Comments
Problems to solve:
- unclosed comments
- newline characters
- tab characters
Solution:
EOF {if (yyStartState == Comment) Error ("unclosed comment");}
DEFINE CmtCh = - {*\t\n}.
START Comment
RULE
"/*" :- {yyStart (Comment);}
#Comment# "*/" :- {yyStart (STD);}
#Comment# "*" | CmtCh + :- {}
Comments are processed in a separate start state called Comment. Everything
is skipped in this state except closing comment brackets which switch back to
start state STD. The single character '*' which can start a closing comment
bracket has to be skipped separately. Otherwise closing comment brackets
would not be recognized because of the "longest match" rule of Rex. An
unclosed comment is indicated by reaching end of file while in start state
Comment. We presuppose the existence of a procedure Error to report this
condition. We don't need to care about tab and newline characters other than
excluding them from the set CmtCh because the two rules needed for this
problem are already predefined by Rex:
#Comment# \t :- {yyTab;}
#Comment# \n :- {yyEol (0);}
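What the bookkeeping for tab and newline characters has to accomplish can be sketched in C as follows. This paper does not show how yyTab and yyEol are implemented, so the record Position, the function advance, and the tab width of 8 are assumptions made for this illustration only.

```c
/* Hypothetical position record with 1-based line and column numbers. */
typedef struct { int line; int column; } Position;

enum { TAB_WIDTH = 8 };   /* assumption: a tab stop every 8 columns */

/* Advances pos over one character of scanned text. */
void advance(Position *pos, char c)
{
    if (c == '\n') {
        pos->line++;                /* a new line starts at column 1 */
        pos->column = 1;
    } else if (c == '\t') {
        /* move to the next tab stop: 1 -> 9, 8 -> 9, 9 -> 17 */
        pos->column = ((pos->column - 1) / TAB_WIDTH + 1) * TAB_WIDTH + 1;
    } else {
        pos->column++;
    }
}
```

Tokens that contain tabs or newlines, such as comments, are the reason why the column cannot simply be advanced by the token length.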
4.2. Character Constants
Problems to solve:
- conversion
- escape sequences
- tab characters
Solution:
EXPORT {
# include "Positions.h"
typedef struct {
tPosition Position;
char Char;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokChar ...
void ErrorAttribute (Token, Attribute)
int Token;
tScanAttribute * Attribute;
{
Attribute->Char = '\0';
}
}
LOCAL {char String [256];}
RULE
' \t ' : {Attribute.Char = '\t'; yyTab2 (1, 1); return TokChar;}
' ANY ' : {Attribute.Char = TokenPtr [1]; return TokChar;}
' \\ n ' : {Attribute.Char = '\n'; return TokChar;}
' \\ t ' : {Attribute.Char = '\t'; return TokChar;}
' \\ v ' : {Attribute.Char = '\v'; return TokChar;}
' \\ b ' : {Attribute.Char = '\b'; return TokChar;}
' \\ r ' : {Attribute.Char = '\r'; return TokChar;}
' \\ f ' : {Attribute.Char = '\f'; return TokChar;}
' \\ {0-7}[1-3] ' : {{unsigned int Value; /* "%o" stores an unsigned int */
(void) GetWord (String);
sscanf (String + 2, "%o", & Value);
Attribute.Char = (char) Value; return TokChar;}}
' \\ ANY ' : {Attribute.Char = TokenPtr [2]; return TokChar;}
In this example the order of the rules is significant because the second
rule would also match the characters of the first one. The same holds for the
group of following rules with respect to the last rule.
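The conversions performed by the group of escape rules can be collected in a single C routine. The following is a minimal sketch; the name decode_escape and its interface are assumptions of this illustration. The octal case is computed by hand instead of calling sscanf.

```c
#include <stddef.h>

/* Hypothetical helper: decodes the escape sequence that starts at the
   backslash s[0] and stores its value in *value.  Returns the number of
   source characters consumed, or 0 if s does not start with a complete
   escape sequence. */
size_t decode_escape(const char *s, char *value)
{
    size_t i;
    unsigned v;

    if (s[0] != '\\' || s[1] == '\0')
        return 0;
    switch (s[1]) {
    case 'n': *value = '\n'; return 2;
    case 't': *value = '\t'; return 2;
    case 'v': *value = '\v'; return 2;
    case 'b': *value = '\b'; return 2;
    case 'r': *value = '\r'; return 2;
    case 'f': *value = '\f'; return 2;
    default:  break;
    }
    if (s[1] >= '0' && s[1] <= '7') {
        v = 0;                      /* one to three octal digits */
        for (i = 1; i <= 3 && s[i] >= '0' && s[i] <= '7'; i++)
            v = v * 8 + (unsigned) (s[i] - '0');
        *value = (char) v;          /* values above 255 are truncated */
        return i;
    }
    *value = s[1];                  /* any other character stands for itself */
    return 2;
}
```

The order of the tests mirrors the order of the rules: the specific escapes are checked first and the catch-all case comes last.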
4.3. Strings
Problems to solve:
- conversion
- escape sequences
- tab characters
- strings ranging over several lines
- source position
Solution:
EXPORT {
# include "StringMem.h"
# include "Positions.h"
typedef struct {
tPosition Position;
tStringRef StringRef;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokString ...
void ErrorAttribute (Token, Attribute) ...
}
LOCAL {char String [256], S [5]; int L;}
DEFINE StrCh = - {"\t\n\\}.
START string
RULE
#STD# \" : {yyStart (string); L = 0;}
#string# StrCh+ :- {L += GetWord (& String [L]);}
#string# \t :- {String [L ++] = '\t'; yyTab;}
#string# \\ n :- {String [L ++] = '\n';}
#string# \\ t :- {String [L ++] = '\t';}
#string# \\ v :- {String [L ++] = '\v';}
#string# \\ b :- {String [L ++] = '\b';}
#string# \\ r :- {String [L ++] = '\r';}
#string# \\ f :- {String [L ++] = '\f';}
#string# \\ {0-7}[1-3] :- {{unsigned int Value; /* "%o" stores an unsigned int */
(void) GetWord (S); sscanf (S + 1, "%o", & Value);
String [L ++] = (char) Value;}}
#string# \\ ANY :- {(void) GetWord (S); String [L ++] = S [1];}
#string# \\ \n :- {yyEol (0); String [L ++] = '\n';}
#string# \" :- {yyStart (STD); String [L] = '\0';
Attribute.StringRef = PutString (String, L);
return TokString;}
#string# \n :- {Error ("unclosed string"); yyEol (0);
yyStart (STD); String [L] = '\0';
Attribute.StringRef = PutString (String, L);
return TokString;}
We presuppose the existence of a string memory module StringMem. The
procedure PutString stores a string in the string memory and returns a
reference to it which can be used as attribute of the token TokString.
5. Ada
5.1. Identifiers
Problems to solve:
- normalization of upper-case and lower-case letters
Solution:
EXPORT {
# include "Idents.h"
# include "Positions.h"
typedef struct {
tPosition Position;
tIdent Ident;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokIdentifier ...
void ErrorAttribute (Token, Attribute) ...
}
LOCAL {char String [256]; int L;}
DEFINE letter = {A-Z a-z}.
digit = {0-9}.
RULE
letter (_? (letter | digit)+ )* : {L = GetLower (String);
Attribute.Ident = MakeIdent (String, L); return TokIdentifier;}
Normalization of upper-case and lower-case letters to lower-case is done
by the predefined operation GetLower of Rex.
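The regular expression for Ada identifiers allows single underscores only between letters or digits. A hand-written C check for the same syntax might look as follows. The function name is hypothetical, and the sketch relies on isalpha and isalnum in the "C" locale to stand for the sets letter and digit.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the longest prefix of s that
   matches  letter (_? (letter | digit)+)*  or 0 if s does not start with a
   letter. */
size_t ada_identifier_length(const char *s)
{
    size_t i;

    if (!isalpha((unsigned char) s[0]))
        return 0;
    i = 1;
    for (;;) {
        if (isalnum((unsigned char) s[i])) {
            i++;
        } else if (s[i] == '_' && isalnum((unsigned char) s[i + 1])) {
            i += 2;     /* an underscore needs a letter or digit behind it */
        } else {
            return i;
        }
    }
}
```

For a__b and tail_ the match therefore ends before the first underscore, just as the regular expression prescribes.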
5.2. Numeric Literals
Problems to solve:
- conversion
Solution:
EXPORT {
# include "StringMem.h"
# include "Positions.h"
typedef struct {
tPosition Position;
tStringRef StringRef;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokDecimalLiteral ...
# define TokBasedLiteral ...
void ErrorAttribute (Token, Attribute) ...
}
DEFINE digit = {0-9} .
extended_digit = digit | {A-F a-f} .
integer = digit (_? digit) * .
based_integer = extended_digit (_? extended_digit) * .
exponent = {Ee} {+\-} ? integer .
RULE
integer ("." integer) ? exponent ? :
{Attribute.StringRef = PutString (TokenPtr, TokenLength);
return TokDecimalLiteral;}
integer "#" based_integer ("." based_integer) ? "#" exponent ? :
{Attribute.StringRef = PutString (TokenPtr, TokenLength);
return TokBasedLiteral;}
The conversion of numeric literals to numeric values is not really solved
in the above solution. By storing the external representation of numeric
literals in a string memory the values are treated symbolically and true
conversion is delayed to be done by other compiler phases.
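How a later compiler phase might carry out the true conversion is sketched below for the simplest case, integer literals without exponent. The function ada_integer_value is an assumption of this illustration and not part of the scanner specification; it does not handle fractions or exponents and does not check for overflow.

```c
#include <stddef.h>

/* Value of an extended digit, or -1 if c is not one. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical later-phase helper: converts the text of an Ada integer
   literal without exponent, e.g. "1_000" or "16#FF#", to its value.
   Returns 1 on success and 0 for a malformed literal or an illegal base. */
int ada_integer_value(const char *text, unsigned long *value)
{
    unsigned long base = 0, v = 0;
    size_t i = 0;
    int d;

    /* The leading integer is either the whole value or the base. */
    for (; text[i] != '\0' && text[i] != '#'; i++) {
        if (text[i] == '_') continue;           /* underscores are ignored */
        d = digit_value(text[i]);
        if (d < 0 || d > 9) return 0;
        base = base * 10 + (unsigned long) d;
    }
    if (text[i] == '\0') {                      /* plain decimal literal */
        *value = base;
        return 1;
    }
    if (base < 2 || base > 16) return 0;        /* Ada allows bases 2 to 16 */
    for (i++; text[i] != '\0' && text[i] != '#'; i++) {
        if (text[i] == '_') continue;
        d = digit_value(text[i]);
        if (d < 0 || (unsigned long) d >= base) return 0;
        v = v * base + (unsigned long) d;
    }
    if (text[i] != '#' || text[i + 1] != '\0') return 0;
    *value = v;
    return 1;
}
```

Such a routine can also detect errors the scanner rules do not see, for example a digit that is too large for the given base as in 8#9#.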
5.3. Character Literals
Problems to solve:
- no problems to solve for character literals
- distinction between character literals and apostrophes
Solution:
EXPORT {
# include "Idents.h"
# include "Positions.h"
typedef struct {
tPosition Position;
char Char;
tIdent Ident;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokIdentifier ...
# define TokCharacterLiteral ...
# define TokApostrophe ...
# define TokLParenthesis ...
# define TokRParenthesis ...
void ErrorAttribute (Token, Attribute) ...
}
LOCAL {char String [256]; int L;}
DEFINE character = {\ -~}.
letter = {A-Z a-z}.
digit = {0-9}.
START QUOTE
RULE
#STD# ' character ' : {Attribute.Char = TokenPtr [1];
return TokCharacterLiteral;}
#QUOTE# ' : {yyStart (STD); return TokApostrophe;}
"(" : {yyStart (STD); return TokLParenthesis;}
")" : {yyStart (QUOTE); return TokRParenthesis;}
letter (_? (letter | digit)+ )*
: {yyStart (QUOTE); L = GetLower (String);
Attribute.Ident = MakeIdent (String, L);
return TokIdentifier;}
The tokens Character Literal and Apostrophe can be distinguished in Ada
only by consideration of some context. A pathological input is, for example,
something like
t'('a','b','c')
where t is a type_mark used as qualification for an aggregate of character
literals. Care has to be taken that 'a', 'b', and 'c' are recognized as
character literals and not '(', ',', and ','. Studying the Ada grammar one can
see that apostrophes are used following identifiers and closing parentheses
only. There are never character literals in these places.
This leads to the above solution with an additional start state called
QUOTE. After recognition of an identifier or a closing parenthesis the
scanner is switched to start state QUOTE. After recognition of all other
tokens the scanner is switched back to start state STD. Apostrophes are
recognized only in start state QUOTE and character literals only in start
state STD. All the other tokens are recognized in both start states.
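The effect of the two start states can be imitated in C by a single flag that records whether the previous token was an identifier or a closing parenthesis. The following sketch classifies only the few tokens needed for the example input; the function classify and its one-letter token codes are assumptions of this illustration, and identifiers are recognized more loosely than in the specification.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: classifies the tokens of src and writes one code
   letter per token to out (capacity size): 'i' identifier, 'c' character
   literal, 'q' apostrophe, '(' ')' ',' for themselves, '?' for the rest. */
void classify(const char *src, char *out, size_t size)
{
    size_t i = 0, n = 0;
    int quote = 0;   /* 1 = previous token was an identifier or ")" */

    if (size == 0)
        return;
    while (src[i] != '\0' && n + 1 < size) {
        char c = src[i];
        if (c == ' ') {                      /* blanks keep the state */
            i++;
            continue;
        }
        if (isalpha((unsigned char) c)) {    /* simplified identifier */
            while (isalnum((unsigned char) src[i]) || src[i] == '_')
                i++;
            out[n++] = 'i';
            quote = 1;
        } else if (c == '\'' && quote) {     /* state QUOTE: apostrophe */
            i++;
            out[n++] = 'q';
            quote = 0;
        } else if (c == '\'' && src[i + 1] != '\0' && src[i + 2] == '\'') {
            i += 3;                          /* state STD: character literal */
            out[n++] = 'c';
            quote = 0;
        } else {
            i++;
            out[n++] = (c == '(' || c == ')' || c == ',') ? c : '?';
            quote = (c == ')');
        }
    }
    out[n] = '\0';
}
```

For the input t'('a','b','c') the sketch yields iq(c,c,c): the first apostrophe follows an identifier and is therefore a token of its own, while the remaining ones belong to character literals.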
5.4. String Literals
Problems to solve:
- conversion
- doubled delimiters
- unclosed strings (at end of lines)
- source position
Solution:
EXPORT {
# include "StringMem.h"
# include "Positions.h"
typedef struct {
tPosition Position;
tStringRef StringRef;
} tScanAttribute;
extern void ErrorAttribute ();
}
GLOBAL {
# define TokStringLiteral ...
void ErrorAttribute (Token, Attribute) ...
}
LOCAL {char String [256]; int L;}
DEFINE StrCh = {\ !#-~}.
START string
RULE
#STD# \" : {yyStart (string); L = 0;}
#string# StrCh+ :- {L += GetWord (& String [L]);}
#string# \"\" :- {String [L ++] = '\"';}
#string# \" :- {yyStart (STD); String [L] = '\0';
Attribute.StringRef = PutString (String, L);
return TokStringLiteral;}
#string# \n :- {Error ("unclosed string"); yyEol (0);
yyStart (STD); String [L] = '\0';
Attribute.StringRef = PutString (String, L);
return TokStringLiteral;}
We presuppose the existence of a string memory module StringMem. The
procedure PutString stores a string in the string memory and returns a
reference to it which can be used as attribute of the token TokStringLiteral.
5.5. Keywords
Problems to solve:
- normalization of upper-case and lower-case letters
Solution:
GLOBAL {
# define TokABORT ...
...
# define TokXOR ...
void ErrorAttribute (Token, Attribute) ...
}
DEFINE A = {Aa}.
...
Z = {Zz}.
RULE
A B O R T : {return TokABORT ;}
...
X O R : {return TokXOR ;}
The idea of the solution is to define identifiers A to Z to stand for the
corresponding upper-case as well as lower-case letters. Then specifying the
keywords in upper-case and spaced does the job.
Appendix: Complete Scanner Specification for Ada
GLOBAL {
FROM Strings IMPORT tString, AssignEmpty, Concatenate, Append, Char;
FROM StringMem IMPORT tStringRef, PutString;
FROM Idents IMPORT tIdent, MakeIdent;
PROCEDURE ErrorAttribute (Token: INTEGER; VAR Attribute: tScanAttribute);
BEGIN END ErrorAttribute;
CONST
TokIdentifier = 1 ;
TokDecimalLiteral = 2 ;
TokBasedLiteral = 3 ;
TokCharLiteral = 4 ;
TokStringLiteral = 5 ;
TokArrow = 6 ; (* '=>' *)
TokDoubleDot = 7 ; (* '..' *)
TokDoubleStar = 8 ; (* '**' *)
TokBecomes = 9 ; (* ':=' *)
TokNotEqual = 10 ; (* '/=' *)
TokGreaterEqual = 11 ; (* '>=' *)
TokLessEqual = 12 ; (* '<=' *)
TokLLabelBracket = 13 ; (* '<<' *)
TokRLabelBracket = 14 ; (* '>>' *)
TokBox = 15 ; (* '<>' *)
TokAmpersand = 16 ; (* '&' *)
TokApostrophe = 17 ; (* ''' *)
TokLParenthesis = 18 ; (* '(' *)
TokRParenthesis = 19 ; (* ')' *)
TokStar = 20 ; (* '*' *)
TokPlus = 21 ; (* '+' *)
TokComma = 22 ; (* ',' *)
TokMinus = 23 ; (* '-' *)
TokDot = 24 ; (* '.' *)
TokDivide = 25 ; (* '/' *)
TokColon = 26 ; (* ':' *)
TokSemicolon = 27 ; (* ';' *)
TokLess = 28 ; (* '<' *)
TokEqual = 29 ; (* '=' *)
TokGreater = 30 ; (* '>' *)
TokBar = 31 ; (* '|' *)
TokABORT = 32 ; (* ABORT *)
TokABS = 33 ; (* ABS *)
TokACCEPT = 34 ; (* ACCEPT *)
TokACCESS = 35 ; (* ACCESS *)
TokALL = 36 ; (* ALL *)
TokAND = 37 ; (* AND *)
TokARRAY = 38 ; (* ARRAY *)
TokAT = 39 ; (* AT *)
TokBEGIN = 40 ; (* BEGIN *)
TokBODY = 41 ; (* BODY *)
TokCASE = 42 ; (* CASE *)
TokCONSTANT = 43 ; (* CONSTANT *)
TokDECLARE = 44 ; (* DECLARE *)
TokDELAY = 45 ; (* DELAY *)
TokDELTA = 46 ; (* DELTA *)
TokDIGITS = 47 ; (* DIGITS *)
TokDO = 48 ; (* DO *)
TokELSE = 49 ; (* ELSE *)
TokELSIF = 50 ; (* ELSIF *)
TokEND = 51 ; (* END *)
TokENTRY = 52 ; (* ENTRY *)
TokEXCEPTION = 53 ; (* EXCEPTION *)
TokEXIT = 54 ; (* EXIT *)
TokFOR = 55 ; (* FOR *)
TokFUNCTION = 56 ; (* FUNCTION *)
TokGENERIC = 57 ; (* GENERIC *)
TokGOTO = 58 ; (* GOTO *)
TokIF = 59 ; (* IF *)
TokIN = 60 ; (* IN *)
TokIS = 61 ; (* IS *)
TokLIMITED = 62 ; (* LIMITED *)
TokLOOP = 63 ; (* LOOP *)
TokMOD = 64 ; (* MOD *)
TokNEW = 65 ; (* NEW *)
TokNOT = 66 ; (* NOT *)
TokNULL = 67 ; (* NULL *)
TokOF = 68 ; (* OF *)
TokOR = 69 ; (* OR *)
TokOTHERS = 70 ; (* OTHERS *)
TokOUT = 71 ; (* OUT *)
TokPACKAGE = 72 ; (* PACKAGE *)
TokPRAGMA = 73 ; (* PRAGMA *)
TokPRIVATE = 74 ; (* PRIVATE *)
TokPROCEDURE = 75 ; (* PROCEDURE *)
TokRAISE = 76 ; (* RAISE *)
TokRANGE = 77 ; (* RANGE *)
TokRECORD = 78 ; (* RECORD *)
TokREM = 79 ; (* REM *)
TokRENAMES = 80 ; (* RENAMES *)
TokRETURN = 81 ; (* RETURN *)
TokREVERSE = 82 ; (* REVERSE *)
TokSELECT = 83 ; (* SELECT *)
TokSEPARATE = 84 ; (* SEPARATE *)
TokSUBTYPE = 85 ; (* SUBTYPE *)
TokTASK = 86 ; (* TASK *)
TokTERMINATE = 87 ; (* TERMINATE *)
TokTHEN = 88 ; (* THEN *)
TokTYPE = 89 ; (* TYPE *)
TokUSE = 90 ; (* USE *)
TokWHEN = 91 ; (* WHEN *)
TokWHILE = 92 ; (* WHILE *)
TokWITH = 93 ; (* WITH *)
TokXOR = 94 ; (* XOR *)
}
LOCAL {
VAR
String, S : tString ;
Word : tString ;
ident : tIdent ;
string : tStringRef ;
ch : CHAR ;
}
DEFINE
digit = {0-9} .
extended_digit = digit | {A-F a-f} .
letter = {a-z A-Z} .
character = {\ -~} .
stringch = {\ !#-~} .
integer = digit (_? digit) * .
based_integer = extended_digit (_? extended_digit) * .
illegal = - {\ \t\n} .
A = {Aa} .
B = {Bb} .
C = {Cc} .
D = {Dd} .
E = {Ee} .
F = {Ff} .
G = {Gg} .
H = {Hh} .
I = {Ii} .
J = {Jj} .
K = {Kk} .
L = {Ll} .
M = {Mm} .
N = {Nn} .
O = {Oo} .
P = {Pp} .
Q = {Qq} .
R = {Rr} .
S = {Ss} .
T = {Tt} .
U = {Uu} .
V = {Vv} .
W = {Ww} .
X = {Xx} .
Y = {Yy} .
Z = {Zz} .
START STRING, QUOTE
RULE
NOT #STRING# integer ("." integer) ? (E {+\-} ? integer) ?
: {yyStart (STD); GetWord (Word);
string := PutString (Word);
RETURN TokDecimalLiteral;}
NOT #STRING#
integer "#" based_integer ("." based_integer) ? "#" (E {+\-} ? integer) ?
: {yyStart (STD); GetWord (Word);
string := PutString (Word);
RETURN TokBasedLiteral;}
#STD# ' character ': {GetWord (String); ch := Char (String, 2);
RETURN TokCharLiteral;}
NOT #STRING# \" : {yyStart (STRING); AssignEmpty (String);}
#STRING# stringch + :- {GetWord (S); Concatenate (String, S);}
#STRING# \"\" :- {Append (String, '"');}
#STRING# \" :- {yyStart (STD); string := PutString (String);
RETURN TokStringLiteral;}
#STRING# \t :- {Append (String, 11C); yyTab;}
#STRING# \n :- {(* Error ("unclosed string"); *) yyEol (0);
yyStart (STD); string := PutString (String);
RETURN TokStringLiteral;}
NOT #STRING# "--" ANY * : {}
NOT #STRING# "=>" : {yyStart (STD); RETURN TokArrow ;}
NOT #STRING# ".." : {yyStart (STD); RETURN TokDoubleDot ;}
NOT #STRING# "**" : {yyStart (STD); RETURN TokDoubleStar ;}
NOT #STRING# ":=" : {yyStart (STD); RETURN TokBecomes ;}
NOT #STRING# "/=" : {yyStart (STD); RETURN TokNotEqual ;}
NOT #STRING# ">=" : {yyStart (STD); RETURN TokGreaterEqual ;}
NOT #STRING# "<=" : {yyStart (STD); RETURN TokLessEqual ;}
NOT #STRING# "<<" : {yyStart (STD); RETURN TokLLabelBracket ;}
NOT #STRING# ">>" : {yyStart (STD); RETURN TokRLabelBracket ;}
NOT #STRING# "<>" : {yyStart (STD); RETURN TokBox ;}
NOT #STRING# "&" : {yyStart (STD); RETURN TokAmpersand ;}
#QUOTE# "'" : {yyStart (STD); RETURN TokApostrophe ;}
NOT #STRING# "(" : {yyStart (STD); RETURN TokLParenthesis ;}
NOT #STRING# ")" : {yyStart (QUOTE); RETURN TokRParenthesis ;}
NOT #STRING# "*" : {yyStart (STD); RETURN TokStar ;}
NOT #STRING# "+" : {yyStart (STD); RETURN TokPlus ;}
NOT #STRING# "," : {yyStart (STD); RETURN TokComma ;}
NOT #STRING# "-" : {yyStart (STD); RETURN TokMinus ;}
NOT #STRING# "." : {yyStart (STD); RETURN TokDot ;}
NOT #STRING# "/" : {yyStart (STD); RETURN TokDivide ;}
NOT #STRING# ":" : {yyStart (STD); RETURN TokColon ;}
NOT #STRING# ";" : {yyStart (STD); RETURN TokSemicolon ;}
NOT #STRING# "<" : {yyStart (STD); RETURN TokLess ;}
NOT #STRING# "=" : {yyStart (STD); RETURN TokEqual ;}
NOT #STRING# ">" : {yyStart (STD); RETURN TokGreater ;}
NOT #STRING# "|" : {yyStart (STD); RETURN TokBar ;}
NOT #STRING# A B O R T : {yyStart (STD); RETURN TokABORT ;}
NOT #STRING# A B S : {yyStart (STD); RETURN TokABS ;}
NOT #STRING# A C C E P T : {yyStart (STD); RETURN TokACCEPT ;}
NOT #STRING# A C C E S S : {yyStart (STD); RETURN TokACCESS ;}
NOT #STRING# A L L : {yyStart (STD); RETURN TokALL ;}
NOT #STRING# A N D : {yyStart (STD); RETURN TokAND ;}
NOT #STRING# A R R A Y : {yyStart (STD); RETURN TokARRAY ;}
NOT #STRING# A T : {yyStart (STD); RETURN TokAT ;}
NOT #STRING# B E G I N : {yyStart (STD); RETURN TokBEGIN ;}
NOT #STRING# B O D Y : {yyStart (STD); RETURN TokBODY ;}
NOT #STRING# C A S E : {yyStart (STD); RETURN TokCASE ;}
NOT #STRING# C O N S T A N T : {yyStart (STD); RETURN TokCONSTANT ;}
NOT #STRING# D E C L A R E : {yyStart (STD); RETURN TokDECLARE ;}
NOT #STRING# D E L A Y : {yyStart (STD); RETURN TokDELAY ;}
NOT #STRING# D E L T A : {yyStart (STD); RETURN TokDELTA ;}
NOT #STRING# D I G I T S : {yyStart (STD); RETURN TokDIGITS ;}
NOT #STRING# D O : {yyStart (STD); RETURN TokDO ;}
NOT #STRING# E L S E : {yyStart (STD); RETURN TokELSE ;}
NOT #STRING# E L S I F : {yyStart (STD); RETURN TokELSIF ;}
NOT #STRING# E N D : {yyStart (STD); RETURN TokEND ;}
NOT #STRING# E N T R Y : {yyStart (STD); RETURN TokENTRY ;}
NOT #STRING# E X C E P T I O N : {yyStart (STD); RETURN TokEXCEPTION ;}
NOT #STRING# E X I T : {yyStart (STD); RETURN TokEXIT ;}
NOT #STRING# F O R : {yyStart (STD); RETURN TokFOR ;}
NOT #STRING# F U N C T I O N : {yyStart (STD); RETURN TokFUNCTION ;}
NOT #STRING# G E N E R I C : {yyStart (STD); RETURN TokGENERIC ;}
NOT #STRING# G O T O : {yyStart (STD); RETURN TokGOTO ;}
NOT #STRING# I F : {yyStart (STD); RETURN TokIF ;}
NOT #STRING# I N : {yyStart (STD); RETURN TokIN ;}
NOT #STRING# I S : {yyStart (STD); RETURN TokIS ;}
NOT #STRING# L I M I T E D : {yyStart (STD); RETURN TokLIMITED ;}
NOT #STRING# L O O P : {yyStart (STD); RETURN TokLOOP ;}
NOT #STRING# M O D : {yyStart (STD); RETURN TokMOD ;}
NOT #STRING# N E W : {yyStart (STD); RETURN TokNEW ;}
NOT #STRING# N O T : {yyStart (STD); RETURN TokNOT ;}
NOT #STRING# N U L L : {yyStart (STD); RETURN TokNULL ;}
NOT #STRING# O F : {yyStart (STD); RETURN TokOF ;}
NOT #STRING# O R : {yyStart (STD); RETURN TokOR ;}
NOT #STRING# O T H E R S : {yyStart (STD); RETURN TokOTHERS ;}
NOT #STRING# O U T : {yyStart (STD); RETURN TokOUT ;}
NOT #STRING# P A C K A G E : {yyStart (STD); RETURN TokPACKAGE ;}
NOT #STRING# P R A G M A : {yyStart (STD); RETURN TokPRAGMA ;}
NOT #STRING# P R I V A T E : {yyStart (STD); RETURN TokPRIVATE ;}
NOT #STRING# P R O C E D U R E : {yyStart (STD); RETURN TokPROCEDURE ;}
NOT #STRING# R A I S E : {yyStart (STD); RETURN TokRAISE ;}
NOT #STRING# R A N G E : {yyStart (STD); RETURN TokRANGE ;}
NOT #STRING# R E C O R D : {yyStart (STD); RETURN TokRECORD ;}
NOT #STRING# R E M : {yyStart (STD); RETURN TokREM ;}
NOT #STRING# R E N A M E S : {yyStart (STD); RETURN TokRENAMES ;}
NOT #STRING# R E T U R N : {yyStart (STD); RETURN TokRETURN ;}
NOT #STRING# R E V E R S E : {yyStart (STD); RETURN TokREVERSE ;}
NOT #STRING# S E L E C T : {yyStart (STD); RETURN TokSELECT ;}
NOT #STRING# S E P A R A T E : {yyStart (STD); RETURN TokSEPARATE ;}
NOT #STRING# S U B T Y P E : {yyStart (STD); RETURN TokSUBTYPE ;}
NOT #STRING# T A S K : {yyStart (STD); RETURN TokTASK ;}
NOT #STRING# T E R M I N A T E : {yyStart (STD); RETURN TokTERMINATE ;}
NOT #STRING# T H E N : {yyStart (STD); RETURN TokTHEN ;}
NOT #STRING# T Y P E : {yyStart (STD); RETURN TokTYPE ;}
NOT #STRING# U S E : {yyStart (STD); RETURN TokUSE ;}
NOT #STRING# W H E N : {yyStart (STD); RETURN TokWHEN ;}
NOT #STRING# W H I L E : {yyStart (STD); RETURN TokWHILE ;}
NOT #STRING# W I T H : {yyStart (STD); RETURN TokWITH ;}
NOT #STRING# X O R : {yyStart (STD); RETURN TokXOR ;}
NOT #STRING# letter (_? (letter | digit)+ )*
: {yyStart (QUOTE); GetLower (Word);
ident := MakeIdent (Word);
RETURN TokIdentifier;}
NOT #STRING# illegal : {IO.WriteS (IO.StdOutput, "illegal character: ");
yyEcho; IO.WriteNl (IO.StdOutput);}
References
[Gro87]
J. Grosch, Rex - A Scanner Generator, Compiler Generation Report No. 5,
GMD Forschungsstelle an der Universitaet Karlsruhe, Dec. 1987.
Contents
1. Introduction
2. Pascal
2.1. Comments
2.2. Identifiers
2.3. Character Constants
2.4. Strings
2.5. Keywords
3. Modula
3.1. Comments
3.2. Strings
4. C
4.1. Comments
4.2. Character Constants
4.3. Strings
5. Ada
5.1. Identifiers
5.2. Numeric Literals
5.3. Character Literals
5.4. String Literals
5.5. Keywords
Appendix: Complete Scanner Specification for Ada
References